A real language model — small enough to live in this page. Give it words, train it with your own hands, and watch it learn to spell. Everything below runs live, on your device.
it's a bigram model — the simplest kind of language model, predating transformers by decades. it looks at one letter to guess the next. real LLMs are transformers; for that, see the Tiny Transformer Lab.
Everything it will ever know. Edit it, or pick a starter set, then press Load.
appends to the corpus above and keeps training — add a few, train more, watch it shift.
Each step nudges the weights downhill to make the real next letter less surprising. Watch the loss fall.
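The "surprise" that training pushes down can be sketched in a few lines. This is a minimal illustration, not the page's actual code: it assumes the loss is the average negative log-probability of the real next letter, and that the alphabet is a to z plus a "." boundary symbol. The names `pairs`, `softmax`, and `mean_surprise` are made up for this example.

```python
import math

# Assumed alphabet: 26 letters plus "." as the word-boundary ("end") symbol.
ALPHABET = "abcdefghijklmnopqrstuvwxyz."
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pairs(words):
    """Yield (current, next) index pairs, wrapping each word in boundaries."""
    for word in words:
        padded = "." + word + "."
        for cur, nxt in zip(padded, padded[1:]):
            yield INDEX[cur], INDEX[nxt]


def softmax(scores):
    """Turn a row of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_surprise(weights, words):
    """Average of -log p(next | current) over every letter pair."""
    losses = [-math.log(softmax(weights[cur])[nxt]) for cur, nxt in pairs(words)]
    return sum(losses) / len(losses)


# An untrained model (all scores zero) guesses uniformly over 27 symbols,
# so its loss is ln(27), roughly 3.296. Training should pull it below that.
untrained = [[0.0] * len(ALPHABET) for _ in ALPHABET]
print(mean_surprise(untrained, ["cat", "car"]))
```

A falling loss curve means this number is shrinking: the model is putting more probability on the letters that really come next.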
Start at the word boundary, sample a letter, and repeat until the model samples the boundary again. Low temperature plays it safe; high temperature gets weird.
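Sampling with temperature can be sketched like this. It is an illustration under assumptions, not the page's implementation: scores are divided by the temperature before the softmax, "." stands for the boundary, and `max_len` is a hypothetical safety cap so the loop always ends.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz."  # "." is the assumed boundary symbol
SIZE = len(ALPHABET)
END = SIZE - 1


def softmax(scores, temperature=1.0):
    """Probabilities from scores; lower temperature sharpens the peak."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_word(weights, temperature=1.0, rng=random, max_len=20):
    """Start at the boundary and sample letters until the boundary returns."""
    current = END
    letters = []
    while len(letters) < max_len:
        probs = softmax(weights[current], temperature)
        current = rng.choices(range(SIZE), weights=probs)[0]
        if current == END:
            break
        letters.append(ALPHABET[current])
    return "".join(letters)


# A hand-made grid that strongly prefers boundary -> h -> i -> boundary.
demo = [[0.0] * SIZE for _ in range(SIZE)]
demo[END][ALPHABET.index("h")] = 10.0
demo[ALPHABET.index("h")][ALPHABET.index("i")] = 10.0
demo[ALPHABET.index("i")][END] = 10.0

print(sample_word(demo, temperature=0.1))  # almost always "hi"
print(sample_word(demo, temperature=5.0))  # usually something stranger
```

Dividing by a small temperature stretches the gaps between scores, so the top letter wins nearly every time. A large temperature flattens them, so unlikely letters get picked more often.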
it only remembers the last letter you gave it, so it riffs more than it answers — that's what makes it a bigram, not a chatbot.
The model is this grid. Row = the current letter, column = the next. Brighter means “more likely.” Watch patterns sharpen as it trains.
click any row to inspect that letter in “Follow the math” below.
Every training step runs this exact little pipeline for each letter. Pick one and watch text turn into a vector, the vector pick a row of the matrix, and a guess fall out — the strips update live as the loss drops.
a single 1 in its own slot, 0 everywhere else — one cell per letter (a…z, then “end”).
one score per possible next letter. teal = the model leans toward it, red = leans away. (this row is the highlighted stripe in the heatmap above.)
now they add to 100%. the model's top guess after “{{ flowInput }}” is “{{ flowTop }}”.
gradient descent adjusts this row so the real next letter's bar grows and the surprise shrinks. press Train and watch the teal stripe in step 3 swell under “{{ flowTarget }}”.
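The four steps above can be sketched end to end for a single letter pair. This is a rough illustration, assuming a 27-by-27 weight grid, a softmax over the selected row, and a plain gradient step on the cross-entropy loss. The function names and the learning rate `lr` are made up for the example.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz."  # "." is the assumed "end" symbol
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
SIZE = len(ALPHABET)


def one_hot(ch):
    """Step 1: a single 1 in the letter's own slot, 0 everywhere else."""
    vec = [0.0] * SIZE
    vec[INDEX[ch]] = 1.0
    return vec


def scores_for(weights, vec):
    """Step 2: vector times matrix. A one-hot input just copies out one row."""
    return [sum(vec[r] * weights[r][c] for r in range(SIZE)) for c in range(SIZE)]


def softmax(scores):
    """Step 3: turn the scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, cur, nxt, lr=0.5):
    """Step 4: nudge the selected row so the real next letter gets likelier."""
    probs = softmax(scores_for(weights, one_hot(cur)))
    loss = -math.log(probs[INDEX[nxt]])  # the "surprise" before the update
    row = weights[INDEX[cur]]
    for c in range(SIZE):
        target = 1.0 if c == INDEX[nxt] else 0.0
        # Gradient of the loss with respect to each score is (prob - target).
        row[c] -= lr * (probs[c] - target)
    return loss


weights = [[0.0] * SIZE for _ in range(SIZE)]
losses = [train_step(weights, "q", "u") for _ in range(50)]
print(losses[0], losses[-1])  # the surprise for "q" then "u" shrinks
```

Only the row for the current letter changes. That is why training on a pair only brightens or dims cells in one stripe of the grid.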
Real models get tuned by human preference: people rate outputs, and training makes the liked ones more likely. Here's a toy version — sample some words, give a ▲ or ▼, then apply it. The model literally re-weights the letter-pairs in the words you rated.
after applying, watch the heatmap shift and re-sample — liked spellings get more likely, disliked ones fade. that's preference tuning in miniature (real RLHF is fancier, but the spirit is this).
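One way to picture the re-weighting is below. It is a guess at the simplest possible scheme, not the page's actual update rule and not real RLHF: every letter pair in a liked word gets its score raised by a fixed amount, and every pair in a disliked word gets it lowered. The names `apply_ratings`, `word_probability`, and `strength` are hypothetical.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz."  # "." is the assumed boundary symbol
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
SIZE = len(ALPHABET)


def pairs_in(word):
    """Letter pairs in a word, including the boundary at both ends."""
    padded = "." + word + "."
    return list(zip(padded, padded[1:]))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_probability(weights, word):
    """Chance of sampling exactly this word, boundary to boundary."""
    prob = 1.0
    for cur, nxt in pairs_in(word):
        prob *= softmax(weights[INDEX[cur]])[INDEX[nxt]]
    return prob


def apply_ratings(weights, ratings, strength=0.5):
    """ratings maps a word to +1 (liked) or -1 (disliked)."""
    for word, vote in ratings.items():
        for cur, nxt in pairs_in(word):
            weights[INDEX[cur]][INDEX[nxt]] += strength * vote


weights = [[0.0] * SIZE for _ in range(SIZE)]
before = {w: word_probability(weights, w) for w in ("zap", "qqq")}
apply_ratings(weights, {"zap": +1, "qqq": -1})
after = {w: word_probability(weights, w) for w in ("zap", "qqq")}
print(before, after)  # "zap" becomes more likely, "qqq" less
```

Because the change lands on letter pairs rather than whole words, other words that share those pairs shift too.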
That's a language model. Real training, real sampling — just very, very small.
A frontier LLM runs this same loop with billions of weights, a context of thousands of tokens or more, and a large share of the public internet as its corpus. The math under your fingertips here, predicting the next token and nudging weights to shrink the surprise, is the math under all of it.